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Abstract. The areal modeling of the extremes of a natural process such 
as rainfall or temperature is important in environmental statistics; for 
example, understanding extreme areal rainfall is crucial in flood pro- 
tection. This article reviews recent progress in the statistical modeling 
of spatial extremes, starting with sketches of the necessary elements of 
extreme value statistics and geostatistics. The main types of statistical 
models thus far proposed, based on latent variables, on copulas and 
on spatial max-stable processes, are described and then are compared 
by application to a data set on rainfall in Switzerland. Whereas latent 
variable modeling allows a better fit to marginal distributions, it fits 
the joint distributions of extremes poorly, so appropriately-chosen cop- 
ula or max-stable models seem essential for successful spatial modeling 
of extremes. 

Key words and phrases: Annual maximum analysis, Bayesian hierar- 
chical model, Brown-Resnick process, composite likelihood, copula, en- 
vironmental data analysis, Gaussian process, generalized extreme- value 
distribution, geostatistics, latent variable, max-stable process, statistics 
of extremes. 



1. INTRODUCTION 

Natural hazards such as heat waves, high rain- 
fall and snowfall, tides and windstorms, arise due 
to physical processes and are spatial in extent. Al- 
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though it is difficult to attribute a particular event, 
such as Hurricane Katrina or the 2010 flooding in 
Pakistan, to the effects of climate change, both ob- 
servational data and computer climate models sug- 
gest that the occurrence and sizes of such catastro- 
phes will increase in the future. The potential con- 
sequences include increases in severe windstorms, 
flooding, wildfires, crop failure, population displace- 
ments and increased mortality. Apart from their di- 
rect impacts, such events will also have indirect ef- 
fects such as increased costs for strengthening in- 
frastructure and higher insurance premiums. There 
is thus a pressing need for a better understanding 
of spatial extremes and more detailed assessment of 
their consequences, and over the last few years the 
topic has become an active interface between cli- 
mate, social and statistical scientists, in interaction 
with stakeholders such as insurance companies and 
public health officials. A particular issue when deal- 
ing with extremes is that although vast amounts of 
data may be available — though of varying quality 
and homogeneity — rare events are necessarily un- 
usual and so the quantity of directly relevant data 
is limited. This difficulty is compounded in the spa- 
tial setting, because forecasting then entails extrap- 
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Fig. 1. Map of Switzerland showing the stations of the 51 rainfall gauges used for the analysis, with an insert showing the 
altitude. The 36 stations marked by circles were used to fit the models, and those marked with squares were used to validate 
the models. Data for the pairs of stations with blue symbols appear in Figure 2. 



olation into a high-dimensional space, with all its 
attendant uncertainties. It is thus important that 
the statistical models used should both be flexible 
and have strong mathematical foundations, so that 
such extrapolation has an adequate basis. These re- 
quirements suggest the use of statistics of extremes, 
as sketched below. 

A variety of statistical tools have been used for the 
spatial modeling of extremes, including Bayesian hi- 
erarchical models, copulas and max-stable random 
fields. The purpose of this paper is to review and to 
compare these approaches in the practical context of 
modeling rainfall, with the twin goals of elucidating 
their properties and of contrasting them in a con- 
crete context. To do this, we use summer maximum 
daily rainfall for the years 1962-2008 at 51 weather 
stations in the Plateau region of Switzerland, pro- 
vided by the national meteorological service, Me- 
teoSuisse. The stations lie north of the Alps and 
east of the Jura mountains, the largest and small- 
est distances between them being around 85 km and 
just over 3 km respectively. We randomly chose 35 



stations to fit our models, and use the remaining 
16 to validate them, as described below. The max- 
imum and minimum distances between fitting and 
validation stations are very similar to those for all 
51 stations. Their locations are shown in Figure 1; 
the region is relatively flat, the altitudes of the sta- 
tions varying from 322 to 910 meters above mean 
sea level. Figure 2 shows the annual maxima and 
the maxima for the summer months, June- August, 
and for the winter months, December-February, for 
four pairs of stations marked in blue in Figure 1. As 
one would expect, there is a clear correlation among 
the maxima at these relatively short distances, and 
this must be reflected in the models if risk is to be 
accurately assessed. 

In Section 2 we provide an overview of the parts 
of statistics of extremes that are needed later, and 
Section 3 provides a similar sketch of geostatistics. 
Subsequent sections describe latent variable, copula 
and max-stable approaches to the spatial modeling 
of extremes, which are then compared in Section 7. 
The paper ends with a brief discussion. 
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Fig. 2. Annual, summer and winter maximum daily rainfall values for 1962-2008 at the four pairs of stations shown in blue 
in Figure 1. In each case the black line represents the station to the east and the red dashed line that to the west. 



2. STATISTICS OF EXTREMES 
2.1 General 

Statistics of extremes has grown into a vast field 
with many domains of application. Systematic math- 
ematical accounts are given by Resnick (1987, 2007) 
and de Haan and Ferreira (2006), while more sta- 
tistical treatments may be found in Beirlant et al. 
(2004), Coles (2001) and Embrechts, Kliippelberg 
and Mikosch (1997), the last focusing particularly 
on finance. Further reviews are provided by Kotz 
and Nadarajah (2000) and Finkenstadt and Rootzen 
(2004). A key issue in applications is that inferences 
may be required well beyond the observed tail of 
the data, and so an assumption of stability is re- 
quired: mathematical regularities in the unobserv- 
able tail of the distribution are assumed to reach 
far enough back into the observable region that ex- 
trapolation may be based on a model fitted to the 
observed events. This requires an act of faith that 
the mathematics of regular variation, which under- 
pins the extrapolation, is applicable in the practical 



circumstances in which the theory is applied. A sta- 
tistical consequence of the lack of data is that tail 
inferences tend to be highly uncertain, and that the 
uncertainty can increase sharply as one moves fur- 
ther into the tail. In applications this can lead to 
alarmingly wide confidence intervals, but this seems 
to be intrinsic to the problem. 

2.2 Univariate Models 

Statistical modeling of extremes may be based on 
limiting families of distributions for maxima that 
satisfy the property of max-stability. At its sim- 
plest we take independent continuous scalar random 

variables X±, . . . , X m ~ F, where the distribution F 
has upper terminal xf = sup{x : F(x) < 1}, and ask 
whether there exist sequences of constants {a m } > 
and {b m } such that the rescaled variables 

(1) a^ l 1 {max(X 1 ,...,X m ) -b m } 

have a nondegenerate limiting distribution G as m — > 
oo. It turns out that if such a G exists, then it must 
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be max-stable, that is, it must satisfy the equation 

(2) G m (b' m + a' m y) = G(y), y G M, m G N, 

for sequences {a' m } > and {b' m }. The only nonde- 
generate distribution with this property is the gen- 
eralized extreme-value (GEV) distribution 

(3) H ( y) = hxp[-{i + Z(y-r))/T}+ 1 % e/o, 

\exp[-exp{-(y-7/)/r}], £ = 0, 

where «+ denotes max(it, 0). The quantities rj and r 
in (3) are respectively a real location parameter and 
a positive scale parameter; £ determines the weight 
of the upper tail of the density, with £ < corre- 
sponding to the reverse Weibull case in which the 
support of the density has a finite upper bound, 
£ = corresponding to the light-tailed Gumbel dis- 
tribution, and £ > corresponding to the heavy- 
tailed Frechet distribution. The rth moment of H 
exists only if r£ < 1. 

Expression (3) is the broadest class of nondegener- 
ate limit laws for a maximum Y of a random sample 
of continuous scalar random variables, but in mul- 
tivariate and spatial settings it is simpler to employ 
mathematically equivalent expressions that result 
from considering the transformed random variable 
Z = {1 + £(V — rj)/r} 1 ^ , which has a unit Frechet 
distribution exp(— 1/z), for z > 0. In this case the 

max-stability property may be written as mZ = 
max(Zi, . . . , Z m ), where Z,Z\,..., Z m represent mu- 
tually independent unit Frechet random variables 

and = denotes equality in distribution. This trans- 
formation has the effect of separating the marginal 
GEV distributions of the variables from their joint 
dependence structure, and this is often convenient. 

A typical goal in applications is the estimation 
of a high quantile of the distribution of Y, that is, 
a solution of the equation H(y p ) = p; for £ / this is 



Vp = V + T{(-logp)" 



1}, 0<p<l, 



with the limit £ — > yielding y p = rj — r log(— logp). 
If the available observations Yj are annual maxima 
and we set p = 1 — 1/T, then y p is called the T-year 
return level, interpreted as the level exceeded once 
on average every T years. Engineering requirements 
may be expressed in terms of T or y p . For exam- 
ple, the Dutch Delta Commission, responsible for 
protection against sea- and river-water flooding, set 
a risk level for sea flooding of North and South Hol- 
land that corresponds to a 10,000-year return level, 
and a risk level for river flooding that corresponds 
to a 1 250-year return level, though their physical in- 



terpretations in a nonstationary world are unclear. 
Estimates of y v are highly sensitive to £, and, if pos- 
sible, it is helpful to pool information about this pa- 
rameter. 

Under mild conditions on the dependence struc- 
ture of stationary time series, the GEV also emerges 
as the only possible nondegenerate limiting distribu- 
tion for linearly renormalized maxima of blocks of 
observations, and this greatly widens its range of 
application; see Leadbetter, Lindgren and Rootzen 
(1983). In typical applications rare events occur in 
clusters whose mean size 6~ l is determined by the 
so-called extremal index, 6 G (0,1]. Block maxima 
then have the GEV distribution H(y) s , but the intra- 
cluster distribution may take essentially any form. 

The discussion leading to (3) implies that for 
large m, F(b m + a m y) m ph H(y), and, therefore, (2) 
implies that for large enough x, 

F(x) « H l / m {(x - b m )/a m } « H(x) 

for some choice of the parameters rj, r and £. Thus, 
although the generalized extreme-value distribu- 
tion (3) arises as the natural probability law for 
maxima of m independent variables, it may also be 
regarded as giving an approximation for the upper 
tail of the distribution of an individual variable, pro- 
vided a limiting distribution for maxima exists. For 
a high value u < xp and x satisfying u <x + u < xf, 
we therefore have 



(4) 



pr(A" > x + u | X > u) 
1 - H(x + u) 



1 



H{u) 

^{l + ix/a u )~^\ x>0, 

where a u = r + £(u — rj) . The last expression in (4) is 
the survivor function of the generalized Pareto dis- 
tribution (GPD), which is commonly used for mod- 
eling exceedances over high thresholds (Davison and 
Smith (1990)). The standard approach to such mod- 
eling presupposes that the times of exceedances over 
the high threshold u are the realization of a station- 
ary Poisson process of rate A, say, and that their 
sizes are independent with survivor function (4). 
This model may also be formulated in terms of a lim- 
iting Poisson process of extremes (Smith (1989)). 

2.3 Multivariate Models 

We now consider componentwise maxima of an 
independent sequence of bivariate random variables 
{X\i,X2i), for i = l,.... If nondegenerate limiting 
marginal distributions exist, these must be of the 
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form (3), and, hence, the rescaled limiting versions 
of the componentwise maxima max(Xn, . . . , X\ n ) 
and max(X2i, . . . , X 2n ) may be transformed to have 
marginal unit Frechet distributions. It turns out that 
if it exists and is nondegenerate, then the limiting 
joint distribution of the transformed componentwise 
maxima can be written as 

pr(Zi <z 1 ,Z 2 <z 2 ) 

(5) 

= exp{-V(z 1 ,z 2 )}, zi,z 2 >0, 

where the exponent measure V(z\,z 2 ) (Resnick 
(1987), page 268) satisfies 

V(z±,oo) = 1/zi, V(oo,z 2 ) = l/z 2 , 

(6) 

V{tz 1 ,tz 2 )=t- 1 V(z 1 ,z 2 ), t>0. 

Here the first two properties ensure that the margin- 
al distributions are unit Frechet, and the third shows 
that the function V is homogeneous of order — 1, 
thereby extending the max-stability property to the 
bivariate case. This argument extends to multivari- 
ate extremes, for which the corresponding function 
V{z\,...,zd) satisfies the analogues of (6). Two 
bounding cases are where Z\, . . . ,Zjj are indepen- 
dent or are entirely dependent, corresponding re- 
spectively to 

V(zi, ...,z D ) = l/zi H h l/z D , 

V(z 1 ,...,z D ) = l/min(z 1 ,...,z D ). 

A consequence of the homogeneity of V is that 
multivariate extreme-value distributions have vari- 
ous so-called spectral representations, of which the 
best-known, due to Pickands (1981), rewrites the ex- 
ponent measure as 

V{zi,...,z D ) 

(7) = / max(wi/zi,...,w D /z D ) 

Js D 

dM(wi,. . . ,w D ), 

where M is a measure on the D-dimensional sim- 
plex So- On setting all but one of the Zd equal to 
+oo, we see that in order for the distribution to have 
unit Frechet margins, M must satisfy the constraint 
f Wd dM{w\ , . . . , wd) = 1 for each d. Unlike for uni- 
variate extremes, there is no simple parametric form 
for the multivariate limiting distribution; V can take 
any form subject to (6). From a statistical viewpoint 
this is a mixed blessing. Although numerous para- 
metric forms for V or equivalent functions have been 



proposed (Kotz and Nadarajah (2000), Section 3.5), 
those in current use tend to be somewhat inflexi- 
ble, and, owing to the curse of dimensionality, non- 
parametric estimation has essentially been confined 
to the bivariate case (Fougeres (2004); Boldi and 
Davison (2007); Einmahl and Segers (2009)). More 
positively, we may use the flexibility to construct 
functions V adapted to specific applications. 

A difficulty for statistical inference arises because 
equations such as (5) specify cumulative distribution 
functions. The likelihood function for D-dimensional 
data involves differentiation of exp{— V(z±, . . . , zd)} 
with respect to z±, . . . , Zd, resulting in a combinato- 
rial explosion; the number of terms is the number 
of partitions of the integer D. Even for only ten 
dimensions, D = 10, a single likelihood evaluation 
would involve a sum of over 100,000 different terms, 
which seems infeasible in general, though there may 
be simplifications in special cases. 

2.4 Extremal Coefficient 

It is useful to have summary measures of extremal 
dependence. One possibility is based on the prob- 
ability that all the transformed variables are less 
than z, 

pr(Zi <z,...,Z D <z) 

(8) =exp{-U(l,...,l)/z} 

= exp(— 9t>/z), z>0, 

owing to the homogeneity of V. The quantity 9x>, 
known as the extremal coefficient of the observations 
Zd, d&D = {1, . . . , D}, varies from 9x> = 1 when the 
observations are fully dependent to 9x> = D when 
they are independent, and thus provides a summary 
of the degree of dependence, though it does not de- 
termine the joint distribution. In the bivariate case 
it is easy to check that 

lim pr(^2 > z | Z\ > z) = 2 — 9t>, 

thereby providing an interpretation of 9t> in terms 
of the limiting probability of an extreme event in 
one variable, given a correspondingly rare event in 
the other. Thus, if 9x> = 2, this probability is zero, 
while smaller values of 9x> will yield larger condi- 
tional probabilities. 

Schlather and Tawn (2003) discuss the consistency 
properties that must be satisfied by the extremal co- 
efficients of subsets of Zi, . . . , Zd, and suggest how 
these coefficients may be estimated. Below we com- 
pare purely empirical estimators for pairs of sites 
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with the fitted versions found from models, so we 
need to estimate 6x> for D = 2. In our experience 
madogram estimators perform well, and we use these 
below. The F-madogram is defined as (Cooley, 
Naveau and Poncet (2006)) 

(9) v F = ±E{\F(Z 1 )-F(Z 2 )\}, 

where F{z) = exp(— 1/z). Unlike the more common 
variogram (Schabenberger and Gotway (2005), Chap- 
ter 4), (9) remains finite when the margins of the 
process are heavy tailed, because ~E{F k (Z\)} = 1/ 
(1 + k), for k > 0, and it has a bijective relationship 
with the extremal coefficient 6 = (1 + 2vp)/(l — 2vp). 
Cooley, Naveau and Poncet (2006) discuss estima- 
tion of the extremal coefficient based on the mado- 
gram, which is extended by Naveau et al. (2009) to 
the setting in which maxima of a stationary pro- 
cess are observed at many points in space and it 
is required to estimate the extremal coefficient as 
a function of the distance between them. 

3. GEOSTATISTICS 

3.1 Generalities 

Geostatistics is a large and rapidly developing do- 
main of statistics, with important applications in 
areas such as public health, agriculture and resource 
exploration, and in environmental and ecological 
studies. Standard texts are Cressie (1993), Stein 
(1999), Wackernagel (2003), Banerjee, Carlin and 
Gelfand (2004), Schabenberger and Gotway (2005) 
and Diggle and Ribeiro (2007). There are three com- 
mon data types: spatial point processes, used to 
model data whose observation sites may be treated 
as random; areal data, available at a set of sites for 
which interpolation may be uninterpretable, such as 
climate model output; and point-referenced or geo- 
statistical data, which may be modeled as values 
from a spatial process defined on the continuum but 
observed only at fixed sites, between which interpo- 
lation makes sense. 

Here we are concerned with point-referenced data, 
for which a suitable mathematical model is a ran- 
dom process {Y(x)} defined at all points x of a spa- 
tial domain X, typically taken to be a contiguous 
subset of ffi 2 . Examples are levels of air pollution 
or annual maximum temperatures observed at a fi- 
nite subset V = {xi, . . . , Xf)} of sites of X. The sta- 
tistical problem is to make inference for the pro- 
cess elsewhere in X . Having observed daily rainfall 
depths Y(xi), . . . ,Y(xd) at a set of weather sta- 
tions, for example, we may wish to predict Y(x) 



at an unobserved site x, estimate the highest depth 
sup x£X Y(x) in the region, or provide a distribution 
for a quantity such as J xeX Y(x) dx. Below we sketch 
elements of geostatistics needed subsequently, leav- 
ing the interested reader to consult the references 
above for further details. 

3.2 Gaussian Processes 

The simplest and best-explored approach to mod- 
eling point-referenced data is to suppose that {Y (x)} 
follows a Gaussian process defined on X . Such a pro- 
cess is called intrinsically stationary if, in addition to 
its finite-dimensional distributions being Gaussian, 
its increments are stationary, that is, the process 
{Y(x + h) — Y(x) :x G X} is stationary for all lag 
vectors h. Then we take E{Y(x + h) — Y(x)} = 0, 
and there exists a function 

7 (h) = ±vax{Y(x + h) -Y(x)}, x,x + heX, 

called the semi variogram; this need not be bounded. 
A stronger assumption is that of second-order sta- 
tionarity, meaning that var{Y(x)} is a finite con- 
stant for x G X and that the covariance function 
cov{y(xi), Y(x2)} exists and may be expressed as 
C{x\ — X2), where C(-) is a positive definite func- 
tion. In this case we may write 7(/i) = C(0) — C(h), 
and we see that j(h) is bounded above by C(0) = 
var{y(x)} and that p(h) =C(h)/C(0) is a correla- 
tion function. For Gaussian processes second-order 
stationarity is equivalent to stationarity, under which 
the joint distribution of any finite subset of points 
of Y(x) depends only on the vectors between their 
sites. 

Gneiting, Sasvari and Schlather (2001) discuss the 
relationships between semivariograms and covari- 
ance functions: in particular, a real function on M 2 
satisfying 7(0) = is the semivariogram of an intrin- 
sically stationary process if and only if it is condi- 
tionally negative definite, that is, 

n 

(10) ^2 ai(ijj(xi - Xj) < 

for all finite sets of sites x\, . . . , x n in X and for all 
sets of real numbers a±, . . . ,a n summing to zero, or, 
equivalently, if exp{— £7(^1)} is a covariance func- 
tion for all t > 0. Clearly, a semivariogram or co- 
variance function valid in MP is also valid in lower- 
dimensional spaces, though the converse is false. 

A covariance function or, equivalently, a semivar- 
iogram is called isotropic if it depends only on the 
length \\xi — X2W of X1—X2 and not on its orientation; 
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Table 1 

Parametric families of isotropic correlation functions. Here K K denotes the modified 
Bessel function of order k and T(u) denotes the gamma function. In each case A > 



Family Correlation function Range of validity 



Whittle-Matern 


p(h) = 


= {2 K - 


1 rM}- 1 (||/ l ||/A)^ K (||/ l ||/A) 


K>0 


Cauchy 


p(h) = 


= {1 + 


(Hl/A) 2 }"* 




Stable 


p(h)~- 


= exp{ 


-(NAn 


0<k<2 


Exponential 


p(h)~- 


= exp(- 


-\\h\\/X) 





this typically unrealistic but very convenient model- 
ing assumption imposes additional restrictions 
on 7(/i). 

Schabenberger and Gotway [(2005), Section 4.3] 
and Banerjee, Carlin and Gelfand [(2004), Sec- 
tion 2.1] describe a variety of valid correlation func- 
tions. Isotropic forms for those used in this paper are 
summarized in Table 1, where A represents a pos- 
itive scale parameter with the dimensions of dis- 
tance, and k is a shape parameter that controls the 
properties of the random process and, in particular, 
can determine the roughness of its realizations. The 
Whittle-Matern family is flexible and widely used in 
practice, though it is often difficult to estimate its 
shape parameter. A simple way to add anisotropy 
to such functions is to replace \\h\\ by (h T Ah) 1 / 2 , 
where A is a positive definite matrix with unit de- 
terminant; this is known as geometric anisotropy. 

If {e(x)} and {e'(x)} are two independent station- 
ary Gaussian processes with unit variance and cor- 
relation functions p(h) and p'(h), then their sum is 
also a Gaussian process, with correlation function 
p(x) + p'(x). A white noise process {e'(x)} has cor- 
relation function p(h) = 5(h), where 5(h) denotes 
the Kronecker delta function, and thus the process 
{a(l — a) l l 2 e(h) + aa l l 2 e' (h)} has variance a 2 and 
correlation function (1 — a)p(h) for h ^ 0; there is 
a so-called nugget effect at the origin, corresponding 
to the extremely local variation added by the white 
noise. In this case a proportion a of the variance 
arises from this nugget effect. 

4. LATENT VARIABLE MODELS 
4.1 General 

Dependence in many statistical settings is intro- 
duced by integration over latent variables or pro- 
cesses. Here this idea can be used to introduce spa- 
tial variation in the parameters. For example, we 
may suppose that the response variables {Y(x)} are 
independent conditionally on an unobserved latent 
process {S(x):x G X}, let the parameters of the 



response distributions depend on {S(x)}, suppose 
that {5(x)} follows a Gaussian process, and then 
induce dependence in {y(x)} by integration over 
the latent process. This approach is common in geo- 
statistics with nonnormal response variables (Dig- 
gle, Tawn and Moyeed (1998); Diggle and Ribeiro 
(2007)), and because of the complexity of the in- 
tegrations involved is most naturally performed in 
a Bayesian setting, using Markov chain Monte Carlo 
algorithms (Gilks, Richardson and Spiegelhalter 
(1996); Robert and Casella (2004)) to perform in- 
ferences. An excellent account of this approach to 
spatial modeling is provided by Banerjee, Carlin and 
Gelfand (2004). 

The first application of latent variables to sta- 
tistical extremes was the study of hurricane wind 
speeds by Coles and Casson (1998) and Casson and 
Coles (1999). They treated position on the Eastern 
seaboard of the US as a scalar spatial variable and 
used a hierarchical Bayes model with a stable corre- 
lation function to fit the point process likelihood to 
their data. In their application the main gains rel- 
ative to treating the data at different sites as inde- 
pendent were the possibility of interpolation of the 
distribution of extreme wind speeds between sites 
at which they had been observed, and an increase 
in the precision of estimation due to borrowing of 
strength. A related approach, but without spatial 
structure, was used by Fawcett and Walshaw (2006) 
to model wind speeds in central and northern Eng- 
land. 

Cooley, Nychka and Naveau (2007) used the gen- 
eralized Pareto model (4) with a common thresh- 
old u at all sites to map return levels for extreme 
rainfall in Colorado. The rate parameter A and the 
scale parameter a u depended on location x in a cli- 
mate space comprised of elevation above sea-level 
and mean precipitation, instead of longitude and lat- 
itude. A stationary isotropic exponential covariance 
function was used to induce spatial dependence in 
the latent processes {S(x)} for these parameters. 
The shape parameter £ had two values, depend- 
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ing on the site location. Turkman, Turkman and 
Pereira (2010) construct a similar but more complex 
model for space-time properties of wildfires in Por- 
tugal, using a random walk to describe the tempo- 
ral properties, and smoothing for the spatial depen- 
dence; their paper also makes suggestions on spatial 
max-stable modeling with exceedances. Gaetan and 
Grigoletto (2007) analyze annual rainfall maxima at 
sites in northeastern Italy, using nonstationary spa- 
tial dependence and random temporal trend in the 
parameters of the generalized extreme-value distri- 
bution. Sang and Gelfand (2009) modeled gridded 
annual rainfall maxima in the Cape Floristic Re- 
gion of South Africa using the generalized extreme- 
value distribution with a spatio-temporal hierarchi- 
cal structure, and in Sang and Gelfand (2010) used 
a Gaussian spatial copula model, transformed to 
the generalized extreme-value scale, to induce de- 
pendence between extremes of point-referenced rain- 
fall data. Other applications of such models to areal 
data are Cooley and Sain (2010), who assessed pos- 
sible changes in rainfall extremes by comparing cur- 
rent and future rainfall computed from a regional 
climate model, using an intrinsic autoregression to 
model how the three parameters of the point pro- 
cess formulation for extremes vary on a large grid. 
Owing to difficulties in estimating the shape param- 
eter, these authors used a penalty due to Martins 
and Stedinger (2000) to ensure that |£| < 1/2. 

In the next section we describe a rather simpler 
latent model for the annual maximum rainfall data 
used in this paper. 

4.2 A Simple Model 

Suppose that the GEV parameters {rj(x), r (x), 
£(x)} vary smoothly for x S X according to a sto- 
chastic process {S(x)}. For our application, and by 
analogy with Casson and Coles (1999), we assume 
that the Gaussian processes for each GEV parame- 
ter are mutually independent, though this assump- 
tion can be relaxed (Sang and Gelfand (2009); Coo- 
ley and Sain (2010)). For instance, we take 

(11) r}(x) = f v (x;l3 ri ) + S 71 (x;a ri ,\ ri ), 

where is a deterministic function depending on 
regression parameters /3„, and is a zero mean, 
stationary Gaussian process with covariance func- 
tion a„exp(— ll/tll/A,,) and unknown sill and range 
parameters and X^. We use similar formulations 
for t{x) and £,(x). Then conditional on the values 
of the three Gaussian processes at the sites (x±, . . . , 
xr>), the maxima are assumed to be independent 



with 

Yifad) I {v( x d),i~(xd),€(x d )} 

(12) ~GEV{n(x d ),T(x d ),Z(x d )}, 

i = 1, . . . ,n,d= 1, . . . , D. 

A joint prior density ir must be defined for the pa- 
rameters a v , q t , oj£, A^, A T , Af, (3 V , (3 T and /3g. In 
order to reduce the computational burden, we use 
conjugate priors whenever possible, taking indepen- 
dent inverse Gamma and multivariate normal dis- 
tributions for q t and (3 T , respectively. No conjugate 
prior exists for A T , for which we take a relatively 
uninformative Gamma distribution. The prior dis- 
tributions for the two remaining GEV parameters 
are defined similarly. The full conditional distribu- 
tions needed for Markov chain Monte Carlo com- 
putation of the posterior distributions are as fol- 
lows: 

vr(?7 | • • •) oc vr(?7 I a v , X v , f3 v )ir(y | r], r, £), 
ir(a v | •••) oc7r(a^ | k*^, 6**^)^(77 | a v , X v , f3 v ), 
tt(X v I ••OocTrfA,, I K* Xv ,8* x JiT(ri \ a v , Xr,,/3 V ), 

^{Pn I •••) ^^{Pr, I K^M 7 ? I OL r] ,X v ,fi T1 ) 1 

where k* , 9*, fi* and S* are the hyperparameters of 
the prior distributions. The full conditional distri- 
butions related to r and £ have similar expressions. 
The corresponding Markov chain Monte Carlo algo- 
rithm is outlined in the Appendix. 

5. COPULA MODELS 

5.1 Generalities 

In view of the flexibility of modeling afforded by 
Gaussian-based geostatistical models, and, in partic- 
ular, the range of potential covariance functions, it is 
natural to investigate how they may be extended to 
model spatial extremes. An obvious approach is to 
use the probability integral transformation to place 
the annual maxima on the Gaussian scale, on which 
their joint distribution can be modeled using stan- 
dard geostatistical tools. However, the requirement 
that the model for the original data should be max- 
stable imposes tight restrictions on the possible co- 
variance structures, even on the Gaussian scale. Al- 
though these restrictions are theoretical in nature, 
we shall see below that they strongly affect the fit 
of the models. There is a close relationship between 
this approach and the use of copulas, and we first 
give a brief outline of the latter. 
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5.2 Copulas 



5.3 Extremal Copulas 



Sklar's Theorem (Nelsen (2006), pages 17-24) es- 
tablishes that the D-dimensional joint distribution F 
of any random vector Yx,..., Yd may be written as 

(13) F(y u ...,y D ) = C{Fx(yx),. ■ -,F D (y D )}, 

where Fx, ... , Fd are the univariate marginal distri- 
butions of Xx, ■ ■ ■ ,Xd and C is a copula, that is, 
a D-dimensional distribution on [0,1]'°. The func- 
tion C is uniquely determined for distributions F 
with absolutely continuous margins. If the marginal 
distributions F^ are continuous and strictly increas- 
ing, then C corresponds to the distribution of Fx(Yx), 
■ ■ -,Fd(Yd), that is, 

C( Ul , ...,u D ) = F{F^(ux), . . .,F^(u D )}. 

Nelsen (2006) and Joe (1997) are clear introductions 
to multivariate models and copulas. 

One might argue, with Mikosch (2006), that the 
transformation to uniform margins is mathemati- 
cally trivial, obscures important features of the data 
that are visible on their original scale and makes 
stochastic modeling awkward, and hence is rarely 
interesting for applications. An alternative view is 
that the implicit separation of the marginal distri- 
butions of the variables from their dependence struc- 
ture provides a unifying framework to modeling mul- 
tivariate data. The discussion following Mikosch's 
paper may be consulted for a lively debate of the 
merits and demerits of copulas; here we merely wish 
to show how they may be used to model spatial ex- 
tremes. 

As a simple and important example, suppose that 
Yx,... , Yd have a joint Gaussian distribution with 
means zero and covariance matrix Vl whose diago- 
nal elements all equal unity. The Gaussian copula 
function is 

(u) c( Ul , ...,u D ) = <$>{<s>-\ux), . . . , $-\u D y,n} t 

where $(-;0) is the joint distribution function of 
Yx , . . . , Yd and <I> denotes the cumulative distribu- 
tion function of a standard normal random variable. 
Here we have used the componentwise transforma- 
tion Ui = &(Yi). The corresponding density is read- 
ily obtained. Similarly, the copula of the multivari- 
ate Student t distribution with v degrees of freedom 
and dispersion matrix Vt may be written 

C(ux, ■ ■ -,u D ) 

(15) 

= T I/ {T- 1 (ui),...,r- 1 (« iJ );f2}, 

where T V (-;Q) and T v are the corresponding joint 
and marginal distribution functions. 



If the random variables Yx,..., Yd possess a joint 
multivariate extreme value distribution, then their 
marginal distributions are of the form (3). As these 
margins are continuous, equation (13) implies that 
the joint distribution must correspond to a unique 
copula, and the max-stability property implies that 
this copula must satisfy 

ckv..,«E) 

= C m (ux,---,u D ), < lil, . . . ,UD < 1, ?n <EN. 

Such a copula, called an extremal copula or stable 
dependent function (Galambos (1987); Joe (1997)), 
is closely related to the exponent measure of Sec- 
tion 2.3, through the relation C(ux, ■ ■ ■ ,ud) = 
exp{— V(— 1/logui, . . . , — 1/loguo)}. The spectral 
representation (7) means that we may write 



(16) 



C(ux,...,u D ) 



= exp 




log Ml 



logU D 



D 



d=l 



Ud 



where the function A, called the the Pickands de- 
pendence function, depends on the measure M on 
the simplex Sd] A is often written as a function of 
just D — 1 of its arguments, which sum to unity. 
Since the transformation from Frechet to uniform 
margins is continuous, convergence of rescaled max- 
ima to a nondegenerate joint limiting distribution 
on the uniform scale follows from the convergence 
on the Frechet scale. A useful example is the extre- 
mal t copula (Demarta and McNeil (2005)), which 
results from rescaling the maxima of independent 
multivariate Student t variables with dispersion ma- 
trix $7 and v degrees of freedom. For D = 2 this 
yields 

w)y/ u 



A{w) = wT l 



u+l 



W(i 



p 



(17) 



+ (1 



.{(1 

w)T v+1 



p 2 )/{v + l)} l/2_ 

{(1 - w)lw) x l v 



p 



{(l-p2)/(„ + l)}V2 

<w < 1,-1 < p< 1 



where p is the correlation obtained from £1. The 
limit of (17) when the correlation may be expressed 
as p = exp{— a 2 / (2u)} ~ 1 — a 2 / (2u) for some a > 
and v — > oo is the Hiisler and Reiss (1989) copula 
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given by 



may be written as 



A(w) = (1 -w)®\ - + a _1 logf - — 
2 V w 



w 



(18) 



+ w$l ~ + a 1 lo 

2 \ 1 — w 



w 



0<w<l; 



see also Nikoloulopoulos, Joe and Li (2009). This 
implies that the extremal t copula is more flexible 
than the Hiisler-Reiss copula, in two distinct ways: 
first, the presence of the degrees of freedom intro- 
duces a further parameter; second, two different cor- 
relation functions that yield the same form for a 
when v — > oo, such as the Gaussian function p(h) = 
exp{ — (h/\) 2 /(2is)} and the Cauchy function p(h) = 
{1 + (h/X) 2 /(2iy)}~ K , will both yield the same form 
for (18) but not for (17). In the limit as v — > oo the 
parameter k must be absorbed by reparametriza- 
tion, as we shall see in Section 7.3. Owing to the 
relationship between correlation functions and var- 
iograms mentioned after (10), we see that a 2 will 
correspond to a semivariogram. 

For any fixed correlation \p\ <1, it follows from (17) 
that the limit as v — > oo is A(w) = 1, which corre- 
sponds to C(u±, U2) = u\U2, so componentwise max- 
ima of correlated normal variables are independent 
in the limit, except in the trivial case \p\ = 1. A simi- 
lar limit with a different rescaling was used by Hiisler 
and Reiss (1989) when taking maxima of m indepen- 
dent bivariate Gaussian variables with correlation p; 
in this case letting p — > 1 such that lim m _ i . 00 4(l — 
p) logm = a 2 also yields (18). 

The limit of (17) when v — > is the Marshall- 
Olkin copula 

C(ui,u 2 ) 

(19) = exp{alog(tiiti 2 ) + (1 - a) logmin(ui, u 2 )}, 

< a < 1, 

where a = Ti{— p/ (1 — p 2 ) 1 / 2 } . The boundary cases 
in (19) are a = 0, which corresponds to perfectly 
dependent extremes and arises for p = 1 , and a = 1 , 
which corresponds to independent extremes and 
arises for p = — 1 . 

5.4 Tail Dependence 

Pairwise tail dependence in copulas may be mea- 
sured using the limits of the conditional probabilities 
pr(U2 > u I U\ > u) and pr([/2 < u \ U\ < u), which 



Xup 



Xlo 



lim 



lim 

M-S>0+ 



l-2u- C(u,u) 
1 -u ' 
C(u, u) 
u 



provided that these limits exist. If one of these ex- 
pressions is positive, then there is dependence in the 
corresponding tail, and otherwise there is indepen- 
dence. If an extremal copula C* corresponding to C 
exists and is nondegenerate, that is, if 



C{u[ 



l/m 1 /m 



H> C*(u 1 ,u 2 ), 
< u\,U2 < l,m 



■ 00, 



Xup — Xlow 



2T, 



then the values of Xup for C and C* are equal (Joe 
(1997), page 178). 

In the max-stable case there is a close relation 
between Xup an d the extremal coefficient, 9, viz., 
Xup = 2 - 6 = 2 - 2,4(1/2, 1/2), where A is the de- 
pendence function in (16). In particular, the Gaus- 
sian copula has Xup = Xlow = 0, the Student t copula 
has 

>+!Ki^p)\ 1/2 l 
i + p J _' 

whose symmetry stems from the elliptical form of 
the joint densities, and the Hiisler-Reiss copula has 
Xup = 2 - 2$(a/2) and xiow = 0. 

5.5 Inference 

Given data yi,...,yD assumed to be a realiza- 
tion from a multivariate distribution whose margins 
take the parametric forms Hi(y; £),..., Hr>{y; £) and 
which has a parametric copula C that depends upon 
parameters 7, the parameter vector •& = (C 5 7) may 
be estimated by forming a likelihood from the joint 
density corresponding to the joint distribution 
C{Hi(yi; C),..., H D (y D ;();-f}. In the spatial con- 
text the Hd will typically depend on the site Xd at 
which yd is observed, as in (12), and 7 will repre- 
sent the parameters of a function that controls how 
the dependence of y c and yd is related to the dis- 
tance between them. For example, when fitting the 
Student t copula, the (c, d) element of the dispersion 
matrix $7 could be of the form a 2 p(x c — Xd), where p 
is one of the correlation functions of Section 3.2. 

If the joint density of Y\, . . . , Yd is available, then 
likelihood inference may be performed in the usual 
way, with the observed information matrix used to 
provide standard errors for estimates based on large 
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samples, and information criteria used to compare 
competing models. Alternatively, Bayesian inference 
can be performed; for example, Sang and Gelfand 
(2010) use Markov chain Monte Carlo to fit such 
a model, with the Gaussian copula, exponential cor- 
relation function and GEV marginal distributions 
having the same scale and shape parameters but 
a regression structure and spatial random effects 
in the location parameter. Unfortunately the joint 
density of Y\ , . . . , Yd is not available when using the 
Hiisler-Reiss and extremal t copulas, for which only 
the bivariate distributions corresponding to (17) 
and (18) are known. In Section 6.2 we discuss the use 
of composite likelihood for inference in such cases. 

6. MAX-STABLE MODELS 
6.1 Models 

It is natural to ask whether there are useful spa- 
tial extensions of the extremal models described in 
Section 2. The central arguments of Section 2.2 were 
extended to the process setting by Laurens de Haan 
around three decades ago, and a detailed account is 
given by de Haan and Ferreira (2006), Chapter 9. 
A key notion is that of a so-called spectral repre- 
sentation of extremal processes, and for our pur- 
poses the most useful such representation is due 
to Schlather (2002). Let {S' 1 }^ be the points of 
a homogeneous Poisson process of unit rate on M + , 
so that {Sj}°?L 1 are the points of a Poisson process 
on M + with intensity ds/s 2 , and let {Wj(x)}JL x be 
independent replicates of a stationary process W(x) 
on M p satisfying E[max{0, Wj(o)}] = 1, where o de- 
notes the origin. Then 

(20) Z(x) = max Sj max{0, Wj(x)} 

3 

is a stationary max-stable process on MP with unit 
Frechet marginal distributions. To see this, note fol- 
lowing Smith (1990) that we can consider the 
{Sj,Wj{x)}JL l to be the points in a Poisson pro- 
cess of intensity ds/s 2 x v(dw) on R + x W, where v 
is the measure of the Wj(x) and W is a suitable 
space. Thus, the probability that Z(x) < z equals 
the void probability of the set {(s,w) G M + x W : 
smax(0,w) > z}, which has measure 

ds f 

— v(dw) = J z^ 1 max{0, w}v(dw) 

z/ max{0,ui} ° J 



because E[max{0, W}(o)}] = 1; hence, Z(x) has a unit 
Frechet distribution. The max-stability follows from 
the infinite divisibility of the Poisson process, which 



implies that the distributions of {maxj=i m Zj(x\), 
■ ■■,maXj=i,...,mZj(%D)} and m{Z(xi), . . . , Z(x D )} 
are equal for any finite subset of points {xi,..., 
x D } C X. 

Different choices for the process W{x) lead to some 
useful max-stable models. Stationarity implies that 
if we wish to describe the joint distributions of the 
max-stable process {Z(x)} at pairs of points of X, 
then there is no loss of generality in considering the 
sites o and h, and for the remainder of this sub- 
section we describe the joint distributions of Z(o) 
and Z(h) under some simple models. 

A first possibility is to take Wj{x) = g(x — Xj), 
where g is a probability density function and {Xj} 
is a homogeneous Poisson process, both on MP. In 
this case the value of the max-stable process at x 
may be interpreted as the maximum over an infinite 
number of storms, centered at the random points Xj 
and of ferocities Sj, whose effects at x are given by 
Sjg(x — Xj). The case where g is the normal den- 
sity was considered by Smith (1990) in a pioneer- 
ing unpublished report and is often called the Smith 
model. If g is taken to be the multivariate normal 
distribution with covariance matrix f2, then the ex- 
ponent measure for Z{6) and Z(h) is 



(21) 



where a 2 (h) = /i T S7 _1 /i is the Mahalanobis distance 
between h and the origin, and <5 is the standard 
normal distribution function. The close resemblance 
to (18) is no coincidence; this corresponds to tak- 
ing an exponential correlation function from Table 1 
with geometric anisotropy and letting the scale pa- 
rameter A — > oo, thereby producing the extremal 
model for an intrinsically stationary underlying 
Gaussian process with semi-variogram proportional 
to /i T r2 _1 /i. The extremal coefficient is the 9(h) = 
2&{a(h)/2}, which attains 2 as h — > oo and falls to 1 
as h — > 0, spanning the range of possible extremal 
dependencies. The exponent measures for the Stu- 
dent and Laplace densities were derived by de Haan 
and Pereira (2006) but are appreciably more com- 
plicated and do not seem to have been used in ap- 
plications. 

A second possibility is to take the {Wj(x)} to be 
stationary standard Gaussian processes with corre- 
lation function p(h), scaled so that E[max{0, 
W}(o)}] = 1. Schlather (2002) shows that in this case 
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the exponent measure for Z(o) and Z(h) is 
1/1 1 



V(Z!,Z2) 



(22) 



+ 

2\zi z 2 



1 + 



1 - 2 



{p(h) + l} Zl z 2 



1/2 



This, the so-called Schlather model, is appealing be- 
cause it allows the use of the rich variety of corre- 
lation functions in the geostatistical literature, as 
sketched in Section 3.2, but unfortunately the re- 
quirement that p(h) be a positive definite function 
imposes constraints on the extremal coefficient 6(h) = 
1 + [{1 - p(h)}/2} 1 / 2 . When heM. 2 and the Wj(x) 
are stationary and isotropic, it turns out that 6(h) < 
1.838, so this model cannot account for extremes 
that become independent when the distance h in- 
creases indefinitely. 

A third possibility stems from noting that if Wj (x) 
is stationary on W, satisfies the properties above (20). 
and is independent of the compact random set Bj 
with indicator function Ibj(x) and volume \B\, and 
if Xj is a point from a Poisson process on M. p with 
rate E(\B\) T 1 , then 

Wf(x) = W j (x)I B] (x-X j ) 

is also stationary on W and may be used as the basis 
of a max-stable process. The exponent measure (22) 
generalizes to 

V( Zl ,z 2 ) 

1 1 

Zl z 2 



(23) 



a(h) 



1 



1-2 



{p(h) + l} Zl z 2 



1/2 



(zi+z 2 ) 2 

where a(h) = E{\B n (h + B)|}/E(|B|) G [0,1] de- 
pends on the geometry of the random set; if h is 
large enough that the mean overlap of B and h + B 
is empty, then the corresponding extremes are inde- 
pendent. Davison and Gholamrezaee (2012) fit mod- 
els based on (22) and (23) to extreme temperature 
data. 

A fourth possibility is to let W(x) = exp{cre(3;) — 
er 2 /2}, a > 0, where e(x) is a stationary standard 
Gaussian process with correlation function p(h). In 
this case the exponent measure for Z(o) and Z(h) 
equals (21), with a 2 (h) = 2a 2 {1 - p(h)}. Hence, the 
extremal coefficient may be written 6(h) = 23>[<r{l — 
p(/i)} 1/2 /V2}. As a ->■ or p -> 1, 9 ->■ 1, while as 



a — > oo, 0—^2 for any p. Thus, this geometric Gaus- 
sian process, so-called, can have both independent 
and fully dependent max-stable processes as limits, 
but has the same exponent measure as the Smith 
model. 

This process can be generalized by taking W(x) = 
exp{e(x) — j(x)}, where e(x) denotes an intrinsically 
Gaussian process with semivariogram ^(h) and with 
e(o) = almost surely, thus ensuring that cr 2 (h) = 
var{e(/i)} = 2^(h) and giving extremal coefficient 
0(h) = 2$[{ 7 (/i)/2} 1 / 2 ]. As j(h) -> 0, we have 6(h) 
1, while if j(h) is unbounded, then 6(h) — > 2 as 
\\h\\ — > oo. Brown-Resnick processes (Davis and 
Resnick (1984); Kabluchko, Schlather and de Haan 
(2009)) appear when e is a fractional Brownian pro- 
cess, that is, 7(/i) oc h a , < a < 2, h > 0. In particu- 
lar, when e is a Brownian process, a = 2, the process 
corresponds to the Smith model, which also arises 
as a Hiisler-Reiss model under the limiting con- 
straint lim ra _ s . 0O 4{1 — p(h)} logn = a(h) 2 . On equat- 
ing the extremal coefficients for the Brown-Resnick 
and Hiisler-Reiss models, a(h)/2 = {j(h)/2} 1 / 2 , we 
can obtain equivalences between their parameters. 
For example, under the assumption of a stable cor- 
relation function, we obtain Ahr = 2 -1 / Khr /i(Abr/ 
/i) Kbr / Khr , in an obvious notation, and thus if khr = 
kbr, then Ahr = 2 -1 / Khr Abr. On comparing the es- 
timates in Tables 4 and 5, we see that this relation 
holds. 

6.2 Pairwise Likelihood Fitting 

The fitting of max-stable processes to data is key 
to applying them. By far the most widely-used ap- 
proaches to fitting are based on the likelihood func- 
tion, either as an ingredient in Bayesian inference, or 
by maximum likelihood. Both require the joint den- 
sity of the observed responses, but as we see from 
Sections 2.3 and 6.1, this appears to be generally 
unavailable for max-stable process models. Only the 
pairwise marginal distributions are known for most 
models, and even if an analytical form of the full 
joint distribution exp{— V(zi, . . . , zjj)} were avail- 
able, it would be computationally infeasible to ob- 
tain the density function from it unless D was small. 
In such circumstances it seems natural to base in- 
ference on the marginal pairwise densities. 

Suppose that the available data may be divided 
into independent subsets 3^1, • ■ • ,34- In the applica- 
tion described above, n would often represent the 
number of years of data, and for a complete data 
set 3^ would represent the maxima at the D sites 
available for each year. Provided that the param- 
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eters $ of the model may be identified from the 
pairwise marginal densities, they may be estimated 
by maximizing a composite log likelihood function 
of the form (Lindsay (1988); Cox and Reid (2004); 
Varin (2008)) 

n 

W = E Yl iog/(y„y fe ;tf). 

i=l {j<k: yj ,y k £yi} 

The variance matrix of the maximum composite like- 
lihood estimator i9 may be estimated by an informa- 
tion sandwich of the form V0) = J" 1 (-&)K(S) J" 1 0) , 
where </($) is the observed information matrix, that 
is, the hessian matrix of — ipifl), and K{$) is the 
estimated variance of the score contributions, corre- 
sponding to the composite log likelihood £ p . Below 
we estimated the latter using centered sums of score 
contributions, in order to reduce the bias of the es- 
timated matrix. 

It is not always straightforward to maximize a com- 
posite log likelihood, and in the applications below 
we used multiple starting points in order to find the 
global maximum. 

Model selection is effected by minimization of the 
composite likelihood information criterion CLIC = 
-2£ p $) + 2tr{J- 1 {$)K({})} (Varin and Vidoni, 
2005), which has properties analogous to those of 
AIC and TIC (Akaike (1973); Takeuchi (1976)). 

Composite likelihood is increasingly used in prob- 
lems where the full likelihood is unobtainable or 
too burdensome for ready computation, and there 
is a burgeoning literature on the topic, summarized 
by Varin (2008). Padoan, Ribatet and Sisson (2010), 
Blanchet and Davison (2011) and Davison and Gho- 
lamrezaee (2012) discuss its application in the con- 
text of extremal inference, and its use to fit spatial 
extremal models based on (21) and (22) has been im- 
plemented in the R libraries SpatialExtremes and 
CompRandFld. See also Smith and Stephenson (2009) 
and Ribatet, Cooley and Davison (2012), who use 
Bayes' theorem and pairwise likelihood to fit ex- 
tremal models to rainfall data. 

Alternative estimators of parameters for pairs of 
sites have been suggested by de Haan and Pereira 
(2006) and de Haan and Zhou (2008), and applied 
by Buishand, de Haan and Zhou (2008). 

7. RAINFALL DATA ANALYSIS 

7.1 Preliminaries 

We illustrate the above discussion using the an- 
nual maximum rainfall data described in Section 1. 
The focus in this paper is on comparison of different 



spatial approaches to modeling the maxima, so we 
fitted the generalized extreme value distribution (3) 
in all cases, using marginal parameters described by 
the trend surfaces 

(24) rj(x) = /3 ,r, + fa. tV lon(x) + fo,T, lat(x) , 

(25) t(x) = /3 ,t + Pi,r lon(x) + /5 2 , T lat(x), 

(26) £(:c) = Au, 

where lon(x) and lat(x) are the longitude and lati- 
tude of the stations at which the data are observed. 
The marginal structure (24)-(26) was chosen using 
the CLIC and likelihood values obtained when fit- 
ting a wide range of plausible models. Experiments 
with fitting of flexible spatial surfaces, such as thin 
plate splines, have shown little benefit of doing so in 
this particular case, and raise problems such as the 
choice of knot locations and of penalty. We there- 
fore decided not to include such terms in the base- 
line model. Other approaches to spatial smoothing 
might also be adopted, as in Butler et al. (2007), who 
use local likelihood estimation for extreme-value 
models (Davison and Ramesh (2000); Hall and Taj- 
vidi (2000)), but they do not seem necessary here. 
Smoothing for extremes is also discussed by Pauli 
and Coles (2001), Chavez-Demoulin and Davison 
(2005), Laurini and Pauli (2009) and Padoan and 
Wand (2008), and might be essential over larger spa- 
tial domains. 

A referee suggested taking r(x) cx rj(x), as is some- 
times used in hydrological applications, but though 
this yields a slightly more parsimonious marginal 
model that fits about equally well as judged using 
CLIC based on an independence log likelihood, we 
decided to stick with the more general form (24)-(26) 

For each correlation function used below, we let A 
denote the scale parameter, and let k and a de- 
note further parameters, depending on the correla- 
tion function, that determine the smoothness of the 
random field. 

To compare the different model fits, we show real- 
izations of the corresponding annual maximum rain- 
fall surfaces, and compare the empirical distribu- 
tions of maxima for subsets of the 16 validation 
stations with those simulated from the fitted mod- 
els. The simulations for the max-stable and extremal 
copula models were performed using the expres- 
sions (20) for large finite numbers of points of the 

Poisson process, and C m {u{ , u 1 ^" 1 ) for large m; 
in both cases we verified that the marginal distribu- 
tions were indistinguishable from their theoretical 
limits. The Brown-Resnick process was simulated 
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Table 2 

Hyperparameters on the latent process used for the rainfall 
application. The prior distributions for a and A are 
respectively inverse Gamma and Gamma 





a 




A 




Shape 


Scale 


Shape 


Scale 




1 


12 


5 


3 


t(x) 


1 


1 


5 


3 




1 


0.04 


5 


3 



using ideas of Oesting, Kabluchko and Schlather 
(2012). 

For reasons of space we confine the discussion be- 
low to summer maximum rainfall, but the same con- 
clusions hold for winter maxima, except that the 
estimated extremal coefficients are slightly higher, 
indicating marginally lower spatial dependence, in 
line with the difference between the weather pat- 
terns leading to heavy rainfall in summer and win- 
ter months; see the center and lower sets of panels 
in Figure 2. 

7.2 Latent Variable Model 

We first describe the results from the latent vari- 
able approach. In order to compare the results on 
a roughly equal footing, the model considered has 
the same trend surfaces for the marginal parame- 
ters as in expressions (24)-(26), with the addition 
of three independent zero mean Gaussian random 
fields S n (x), S T (x) and S%(x), as in (11), each with 
an exponential correlation function. Proper normal 
priors with very large variances were assumed for the 
regression parameters f3 appearing in (24)-(26). As 
suggested by Banerjee, Carlin and Gelfand (2004), 
informative priors should be used for the parame- 
ters a and A of the covariance functions, in order 
to yield nondegenerate marginal posterior distribu- 
tions for them. Suitable prior densities were chosen 



after exploratory analysis of the fitted marginal dis- 
tributions and are summarized in Table 2; they pro- 
vide proper prior densities with means similar to the 
average marginal maximum likelihood estimates but 
much larger variances. A summary of the posterior 
is given in Table 3. These results were obtained after 
300,000 iterations of the Markov chain, thinned by 
a factor 30, preceded by a burn- in of 5000 iterations. 

The variation of r]{x) with latitude and longitude 
seems reasonable, with the decrease as latitude in- 
creases and longitude decreases corresponding to 
a general reduction in altitude away from the Alps. 
The pattern of variation for the scale parameter is 
similar. Similar to other data sets on extreme rain- 
fall, the shape parameter is positive, corresponding 
to the heavy-tailed Frechet case, but not strongly 
so. In accordance with other authors (Zhang (2004); 
Sang and Gelfand (2010)), we found that it was 
not possible to learn from the data simultaneously 
about the parameters a and A, for which there is 
an identifiability problem. As a result, the poste- 
rior distributions for A are close to the chosen prior 
Gamma(5,3). A sensitivity analysis on the choice of 
this prior was performed and, although the posterior 
distributions for a and A were different, the predic- 
tive pointwise return level maps shown in Figure 3 
were similar. 

Figure 3 shows maps of the predictive pointwise 
posterior mean for the 25-year return level, with 
pointwise 95% credible intervals. These maps were 
produced by first generating one conditional simu- 
lation of three independent Gaussian processes for 
each state of the Markov chain given its then-current 
values oi rj, t and £, and then using this realization 
to compute pointwise 25-year return levels at un- 
gauged sites. This shows the main strength of the 
latent variable approach: the use of stochastic pro- 
cesses to model the spatial behavior of the marginal 
parameters enables us to capture complex local vari- 
ation in the return levels that deterministic trend 



Table 3 

Summary statistics for the posterior distributions of the latent process parameters. The posterior means and the associated 
95% credible intervals (parentheses) are displayed. h+ = — A log 0.05 corresponds to the distance for which the correlation 

function equals 0.05. The parameter /3o,J is dimensionless 





[3 (mm) 


/3i (mm/km Ion) 


(3? (mm/km 


lat) 


a 


A (km) 


/i+ (km) 


T)(x) 


26 (24,29) 


0.05 (-0.02,0.13) 


-0.16 (-0.23, 


-0.10) 


5 (2,12) 


22 (9,38) 


64 (28,114) 


t(x) 


9 (8.2,9.8) 


5 (-26,37) x 10~ 3 


-0.04 (-0.06,- 


-0.01) 


0.58 (0.18,1.6) 


17 (6, 34) 


51 (17,101) 




0.16 (0.06,0.27) 








9 (4,20) x 10" 3 


22 (8,42) 


67 (25, 125) 
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Fig. 3. Maps of the (predictive) pointwise 25-year return level estimates for rainfall (mm) obtained from the latent variable 
and max-stable models. The top and bottom rows show the lower and upper bounds of the 95% pointwise credible/confidence 
intervals. The middle row shows the predictive pointwise posterior mean and pointwise estimates. The left column corresponds 
to the latent variable model assuming Gamma(5,3) prior on X. The middle column assumes the less informative priors 
X v ~ Gamma(l, 100) , A T ~ Gamma(l, 10) and X^ ~ Gamma(l, 10) . The right column corresponds to the extremal t copula 
model. 



surfaces cannot reproduce. The simulation output 
can be manipulated to obtain posterior standard er- 
rors and other uncertainty measures for quantities 
of interest, such as these or other return levels. 

Although the pointwise return level maps look 
reasonable, the latent variable approach does not 
provide plausible spatial process realizations. The 
upper left panel of Figure 4 shows one realization 
of the spatial process from this model. Clearly, the 
assumption of conditional independence given the 
latent process leads to unrealistic spatial structure, 
and this has a severe impact when using this model 
to analyze the multivariate distribution of extremes 



for several sites, or for regional analysis. Compared 
to the other models, the conditional independence 
assumption underlying the latent variable model 
leads to much less variation in quantities such as 
the statistic used to choose the simulations shown, 
that is, T = \13\~ 1 J xejg Z(x), where B denotes a ball 
of radius 10 km centered on Zurich. 

Figure 5 confirms this through QQ-plots for dif- 
ferent groupwise maxima. The multivariate distribu- 
tion of the validation sample is very poorly modeled, 
because the conditional independence assumption is 
not appropriate for extreme rainfall events involving 
dependence between stations. For instance, when 
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Fig. 4. One realization from each of the models. From left to right, the top row shows results from the latent variable. 
Student t copula, Hiisler-Reiss copula and extremal-t copula models; the bottom row shows results from the Smith, Schlather, 
geometric Gaussian and Brown-Resnick models. The extreme top and bottom panels show histograms of 1000 realizations of 
the summary statistic T, and the vertical lines correspond to the realizations shown. 



groups of maxima are considered, the latent variable 
model seems to systematically overestimate their 
joint distribution, by an amount that depends on 
the number of sites contributing to the maximum. 

7.3 Copula Models 

In this section we describe the results obtained 
from fitting the copula models. We fit the nonex- 
tremal Gaussian and Student t copulas using the 
full likelihood, and the extremal copulas using max- 
imum pairwise likelihood estimation. In each case 
we use the marginal structures (24)~(26) and the 
correlation functions in Table 1. 

We first fitted the Gaussian and Student t copu- 
las (14) and (15) with GEV marginal distributions 
and various correlation functions, using the corre- 
sponding likelihoods. These copulas are not max- 
stable, so we do not expect this approach to yield 



good models for the joint extremes; this is essen- 
tially a frequentist approach to fitting models like 
that of Sang and Gelfand (2010). The left panel of 
Figure 6 shows the empirical semivariogram for the 
fitting and validation stations, with the fitted semi- 
variograms from the best and worst-fitting mod- 
els obtained using this approach. The Student t fit 
seems reasonable, though not ideal, but the center 
and right panels show that the corresponding ex- 
tremal coefficients do not match to the data; the 
extremal coefficient for the Gaussian copula equals 
2 at all distances h, and that for the Student t cop- 
ula predicts very weak extremal dependence incon- 
sistent with the observed extremes. 

Turning to extremal copulas, Table 4 shows that 
the extremal t models all fit the data appreciably 
better than do the Hiisler-Reiss models, with well- 
determined but small estimates of the degrees of 
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Fig. 5. Model checking for the latent variable model. The top row compares pairwise maxima simulated from the model and 
the observed maxima for pairs of stations separated by 7 km (left), 45 km (middle) and 83 km (right). The middle row compares 
the observed and predicted minima (left), mean (middle) and maxima (right) for a group of five stations chosen randomly. 
The bottom row compares the observed and predicted minima (left), mean (middle) and maxima (right) for all 16 stations 
kept for model validation. Overall 95% confidence envelopes are also shown. For clarity the values are transformed to the unit 
Gumbel scale using the probability integral transform for the fitted GEV model for each station. 



freedom. As in more standard geostatistical applica- 
tions, it is difficult to estimate the scale and shape 
parameters of the correlation functions, and this is 
compounded by the presence of the degrees of free- 
dom for the extremal t models; the standard errors 
for A and k can be large and somewhat variable. At 
first sight the differences in the estimates of A in the 
upper and lower parts of the table are surprising, 
but they are clarified by noting that the limit (18) 
obtained by letting v — > oo in (17) implies that for 
large v, {\\h\\/\) K ~ 2v{\\h\\/\'Y , where the param- 
eters A', are those of the extremal t model and 
those without the primes are those of the Hiisler- 
Reiss model. We therefore expect that k' « K and 
A' Rj \(2v) 1 ■> and this is indeed the case, apart from 
estimation error. Perhaps not surprisingly for rain- 
fall data, which tend to have high local variation cor- 
responding to rough spatial processes, the estimates 
of the shape parameters n are less than unity. 



To aid the comparison of these models, we intro- 
duce an extremal practical range. In conventional 
geostatistics with stationary isotropic correlation, 
the practical range is the distance h for which the 
correlation function p(h) = 0.05. In the extremal con- 
text we instead use the distances h- and h+ sat- 
isfying 9{h-) = 1.3 and 0(h+) = 1.7. Table 4 sug- 
gests that these distances are more stable than the 
parameters of the correlation functions themselves, 
though those for the exponential and Cauchy func- 
tions, which provide the worst fits, indicate stronger 
dependence of extremal rainfall. Overall inclusion of 
the degrees of freedom has a large impact on the 
model fit, while the effect of varying the correla- 
tion function is more limited. The extremal t model 
with the Whittle-Matern correlation function pro- 
vides the minimum CLIC, consistent with the best 
fit obtained with max-stable models below, from the 
geometric Gaussian process. 
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Fig. 6. Comparison between data and fitted copula models. The left panel shows the empirical semivariogram values for the 
pairs of stations used in the fitting (grey) and the validation stations (black), with the fitted semivariograms for the best (red) 
and worst (green) models. The center and right panels show F-madogram estimates of the pairwise extremal coefficients for 
the fitting and validation stations, and the fitted extremal coefficient functions for the copula models with the lowest CLIC 
(red line) and the highest CLIC (green line). The horizontal dashed lines in the center and right panels are at 1.3 and 1.7; 
these panels also show the extremal coefficient curves (black) for the models in the left panel. The center and right panels also 
show the extremal coefficients corresponding to the best-fitting nonextremal Gaussian and Student t copula models; that for 
the Gaussian model takes a constant value 2, and that for the t model lies well above the empirical extremal coefficients. 



The center and right panels of Figure 6 compare 
the F-madogram estimates of the extremal coeffi- 
cients between pairs of stations with the extremal 
coefficient functions obtained with the fitted Hiisler- 
Reiss and extremal t models that have the largest 
and smallest CLIC values. The interpretation of 
such plots is somewhat awkward because the F- 
madogram estimates do not correspond to indepen- 
dent pairs of stations, but both fits appear to un- 



derestimate extremal dependence at distances un- 
der 30 km, and to provide better fits, at least to the 
grey points, at longer distances. 

The rightmost three top panels in Figure 4, which 
show one realization from each of the Student t 
and best Hiisler-Reiss and extremal t copula mod- 
els, show that these processes provide more realis- 
tic spatial dependence than does the latent process, 
though the Student t realization gives a smaller area 



Table 4 

Fits of extremal t and Hiisler-Reiss copula models to Swiss rainfall data. The first column reports the correlation function 
used, and the second to fourth columns give parameter estimates (standard errors); DoF is the estimated degrees of freedom. 
A is the scale parameter and k is the shape parameter. (*) denotes that the parameter is held fixed, h- and h+ are the 
estimated distances at which 0(h) equals 1.3 and 1.7. NoP is the number of parameters, £ p is the maximized composite 

log-likelihood, and CLIC is the information criterion 



Correlation 








Extremal t 










DoF 


A (km) 




h- (km) 


h+ (km) 


NoP 




CLIC 


Whittle 


5.5 (2.1) 


316 (235) 


0.39 (0.05) 


6.9 


87 


10 


-210,232 


423,107 


Stable 


5.5 (2.1) 


279 (206) 


0.81 (0.09) 


6.9 


88 


10 


-210,233 


423,110 


Exponential 


4.8 (1.5) 


160 (62) 


1.00 (*) 


9.0 


72 


9 


-210,264 


423,131 


Cauchy 


5.5 (2.1) 


6.3 (1.2) 


0.06 (0.03) 


7.6 


217 


10 


-210,296 


423,230 










Hiisler Reiss 








Semivariogram 




A (km) 


K 


h- (km) 


h+ (km) 


NoP 




CLIC 



Stable 
Exponential 



11.8 (3.4) 0.74 (0.07) 5.8 84 9 -210,348 423,232 

14.6 (3.2) 1.00 (*) 8.7 63 8 -210,438 423,338 




Fig. 7. Model checking for the extremal t model with the Whittle-Matern correlation function. For details, see the caption 
to Figure 5. 



with really large precipitation, consistent with Fig- 
ure 6. 

Figure 7 shows the outcome of the model checking 
procedure for extremal t models with the Whittle- 
Matern correlation function, using the validation sta- 
tions. Overall the fit seems much better than for the 
latent variable model. For comparison, Figure 8 dis- 
plays the results of the model checking procedure 
for the Student t copula model with the Whittle- 
Matern correlation function. Although the fit is ap- 
preciably better than for the latent variable model, 
the systematic appearance of the observed minima 
above the diagonal and of the observed maxima be- 
low the diagonal suggest that the model does not 
include enough dependence in the extremes, as one 
anticipates from the rapidly decreasing extremal de- 
pendence for this model, shown in the right panel of 
Figure 6. Overall the fit is not as good as that of the 
extremal t copula, shown in Figure 7. 

A map of the pointwise 25-year return levels for 
this model is very similar to the corresponding plot 
for the max-stable models, shown in Figure 3; both 
are less plausible than the corresponding map for 



the latent variable model, which shows better adap- 
tation to local variation, though at the cost of more 
uncertainty for quantile estimates. 

7.4 Max-Stable Models 

In this section we focus on the max-stable models, 
again fitted with the marginal trend surfaces (24)- 
(26). Table 5 summarizes the fitted models. The 
Brown-Resnick and the geometric Gaussian models 
have the smallest CLIC values, perhaps owing to 
the behavior of their extremal coefficients for large 
distances. The variance parameter a 2 in the geo- 
metric Gaussian model controls the upper bound 
of the extremal coefficient function, for instance, 
for an isotropic correlation function in M. 2 9(h) < 
2$(0.838cr), for all h > 0. Hence, this model allows 
extremal coefficients 9(h) ~ 2 if a 2 is large enough. 
The Brown-Resnick model with variogram "f(h) = 
\h\ a , < a < 2, also allows 0(h) — > 2 when h — > +oo, 
because then j(h) — > +oo. These differ from the 
Schlather model, which imposes 9(h) ->- 1 + 1/2 1 / 2 
as h—>oo. See Figure 9. 




Fig. 8. Model checking for the Student t copula model with the Whittle-Matern correlation function. For details, see the 
caption to Figure 5. 



Isotropic and anisotropic Smith models were also 
considered. Their CLIC values show that the aniso- 
tropic model is better, but both fit much less well 
than the other models. This might be explained by 
the lack of flexibility of this model, which assumes 
a deterministic shape for the storms and leads to 
dependence of the extremal coefficient on the Ma- 
halanobis distance rather than on a more flexible 
function of distance; it corresponds to taking the 
Brown-Resnick model with variogram 7(^1) <x h 2 . 

Apart from the Smith models, all give comparable 
estimates for h-, though the choice of the correla- 
tion function may have a large impact on the es- 
timation of In particular, the Cauchy function 
differs greatly from the others. The best-fitting mod- 
els show values for h+ similar to those from the best 
extremal copula models, though the copula models 
have lower CLIC values. 

The geometric Gaussian model with Whittle-Ma- 
tern or stable correlation functions and the Brown- 
Resnick model appear to provide the best fits to our 
data, though we had difficulties in simultaneously 



estimating a 2 , A and K for the former models. In 
accordance with our results for the latent variable 
model, these parameters seem not to be jointly iden- 
tifiable (Zhang (2004)), perhaps because of the up- 
per limit of around 90 km on the distances between 
sites, which means that a 2 is difficult to estimate 
from these data. The safest strategy when using the 
geometric Gaussian model appears to be to fix one of 
these parameters, preferably the range A or shape k, 
which do not determine an upper bound for the ex- 
tremal coefficient. Some numerical experimentation 
shows that a 2 and A are strongly related: completely 
different values of them can lead to indistinguish- 
able extremal coefficient functions, at least for the 
distances seen in our data. 

Figure 10 shows the fits of the best max-stable 
model to the data from the validation stations. Pair- 
wise dependencies seem to be well estimated what- 
ever the distance between two sites, and the higher- 
dimensional properties also seem to be accurately 
modeled, even if different summary statistics are 
considered. 
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Table 5 

Summary of the max-stable models fitted to the Swiss rainfall data. Standard errors are in parentheses. (*) denotes that the 
parameter was held fixed, h- and h+ are, respectively, the distances for which 8(h) is equal to 1.3 and 1.7. NoP is the 
number of parameters. £ p is the maximized composite log-likelihood and CLIC is the corresponding information criterion 











Smith 










Correlation 


(Tii (km) 


(Ti2 (km) 


CT22 (km) 


h- (km) 


/i+ (km) 


NoP 




CLIC 


Isotropic 


259 (45) 


(*) 


022 = On 


12.4 


33 


8 


-212,455 


427,113 


Anisotropic 


251 (46) 


64 (13) 


290 (50) 


6.6-11.1 


18-30 


10 


-212,395 


427,020 










Schlather 










Correlation 




A (km) 




h- (km) 


h+ (km) 


NoP 


l v 


CLIC 


Whittle 




39.3 (21.4) 


0.44 (0.12) 


6.0 


147 


9 


-210,813 


424,200 


Stable 




34.8 (11.5) 


0.95 (0.16) 


6.3 


146 


9 


-210,815 


424,206 


Exponential 




34.1 (9.0) 


1.00 (*) 


(i.8 


134 


8 


-210,816 


424,167 


Cauchy 




8.0 (2.2) 


0.34 (0.16) 


7.1 


2370 


9 


-210,874 


424,321 








Geometric Gaussian 










Correlation 


a 2 


A (km) 




h- (km) 


/i+ (km) 


NoP 




CLIC 


Whittle 


11.1 (3.8) 


700 (*) 


0.37 (0.03) 


5.8 


86 


9 


-210,349 


423,232 


Stable 


15.0 (5.4) 


1000 (*) 


0.76 (0.06) 


5.9 


86 


9 


-210,349 


423,233 


Exponential 


2.42 (0.93) 


53.2 (18.4) 


1.00 (*) 


7.0 


116 


9 


-210,368 


423,271 


Cauchy 


30.9 (8.1) 


5.2 (0.66) 


0.01 (*) 


6.7 


192 


9 


-210,412 


423,355 










Brown-Resnick 










Variogram 




A (km) 


a 


h- (km) 


h+ (km) 


NoP 




CLIC 


Fractional 




30 (9.23) 


0.74 (0.07) 


5.8 


84 


9 


-210,348 


423,231 


Brownian 




29 (6.36) 


1.00 (*) 


8.7 


63 


8 


-210,438 


423,338 



Figure 4, which plots one realization from the best 
Smith, Schlather, geometric Gaussian and Brown- 
Resnick max-stable models, illustrates the differences 
among them. The elliptical forms in the Smith model 
realization seem unrealistic, while the Schlather, ge- 



ometric Gaussian and Brown-Resnick model real- 
izations appear more plausible. The difference be- 
tween those from the last three models is less ob- 
vious visually, though the geometric Gaussian and 
Brown-Resnick models tend to give less dependence 
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Fig. 10. Model checking for the Brown-Resnick model. For details, see the caption to Figure 5. 



at long ranges than does the Schlather model, ow- 
ing to the restrictions that the latter imposes on the 
extremal coefficient. 

The drawback of the max-stable process is that it 
may be difficult to find accurate trend surfaces for 
the marginal parameters. This may result in unre- 
alistically smooth pointwise return levels, similar to 
that shown in Figure 3. 

8. DISCUSSION 

If the purpose of spatial analysis of extremes is 
simply to map marginal return levels for the under- 
lying process, a very simple approach is to apply 
kriging to quantiles estimated separately for each 
site. The strong asymmetry in the uncertainty sug- 
gests that this is best applied to transformed esti- 
mates, perhaps their logarithms, followed by back- 
transformation to the original scale. The obvious 
disadvantages of this approach are that maps for 
different quantiles may be contradictory, that their 
uncertainties may be hard to assess, and that the 
resulting maps may be inconsistent with risk assess- 
ment for more complex events. 



Turning to the approaches discussed in detail 
above, a major asset of latent variable models is 
flexibility: it is conceptually straightforward to add 
further elements or other layers of variation, if they 
are thought to be necessary, though the computa- 
tions become more challenging. Moreover, the use 
of stochastic processes for the spatial distribution 
of the GEV parameters enables the treatment of 
situations for which these parameters display com- 
plex variation. Prediction at unobserved sites x+ is 
also straightforward using conditional simulation of 
Gaussian random fields for each state of the chain, 
from which observations can be generated at each x+ , 
and it is straightforward to obtain measures of un- 
certainty for quantities of interest. 

Apart from generic issues related to the choice of 
prior distribution in Bayesian inference, there are 
two main drawbacks to the latent variable approach 
in the present context. The first is that after the 
averaging over the underlying process {5(x)}, the 
marginal distribution of {y(x)} is not of extreme- 
value form, and therefore will not be max-stable. 
This contradicts the argument leading to (3), but 
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might be regarded as the price to be paid for the flex- 
ibility of including latent variables and fully Bayesian 
inference; see, for example, Turkman, Turkman and 
Pereira (2010). The second drawback is more seri- 
ous, and stems from the construction of the model: 
conditional on the underlying process, extremes will 
arise independently at adjacent sites. This is clearly 
unrealistic, and seems to undermine the use of this 
approach to forecasting for specific events, though 
it may still be very useful for the computation of 
marginal properties of extremal distributions, such 
as return levels. The copula-based approach of Sang 
and Gelfand (2010) is intended to deal with this, but 
results in Section 7.3 suggest that a closely-related 
frequentist copula model does not adequately ex- 
plain the local extremal dependence of our annual 
maximum rainfall data, so the use of Gaussian copu- 
las cannot be regarded as wholly satisfactory. A more 
promising approach has been suggested in the as- 
yet unpublished work of Reich and Shaby (2011), 
who develop a finite latent process approximation 
to the Smith process in a Bayesian framework, and 
are thus able to approximate this model closely us- 
ing Markov chain Monte Carlo methods. They are 
also able to incorporate nonstationarity and latent 
process models for the marginal parameters. 

Our rainfall application suggests that there is an 
awkward trade-off to be made in modeling spatial 
extremes. Latent variables allow a realistic and flex- 
ible spatial structure in the marginal distributions 
and thus enable a good assessment of the variation of 
return levels across space, but the spatial structure 
they attribute to extreme events seems quite unre- 
alistic: compare the simulations in Figures 3 and 4. 
It would be worthwhile to investigate the fitting of 
such structures using pairwise likelihood, which is 
the only approach currently available for the fitting 
of the spatially appropriate copula and max-stable 
process models. Ribatet, Cooley and Davison (2012) 
report promising results from an investigation into 
the use of pairwise likelihood in Bayesian inference, 
but it would be good to have a better understanding 
of that approach. 

The connections between copula and max-stable 
models also need more investigation: while the for- 
mer seem to provide the best fits overall — compare 
Tables 4 and 5 — the formulation of the latter in 
terms of a full spatial process is very attractive. Pre- 
sumably the difference is simply a technical matter 
of using a spatially-defined dependence function and 
extending the copula models to the full spatial do- 



main, but the connections are intriguing and merit 
further study. 

Although we have used pairwise likelihood for in- 
ference, it would be worthwhile to investigate whether 
the inclusion of third- and higher-order marginal 
densities in the composite likelihood would increase 
its efficiency. Genton, Ma and Sang (2011) show 
that this increases the efficiency of estimation for 
the Smith model, but so far as we are aware, their 
work has not yet been extended to other max-stable 
models or used in applications. Another way to im- 
prove statistical efficiency while reducing the com- 
putational burden of the composite likelihood could 
be the downweighting or exclusion of likelihood con- 
tributions from sites very far apart, as suggested by 
Bevilacqua et al. (2012) and Padoan, Ribatet and 
Sisson (2010); in the context of time series, includ- 
ing unnecessary pairs can degrade inference (Davis 
and Yau (2011)), and simulations suggest that this 
is also true for certain models for spatial extremes 
(Gholamrezaee (2010); Padoan, Ribatet and Sisson 
(2010)). This is related to the issue of the scalabil- 
ity of the max-stable and extremal copula analyses: 
the combinatorial explosion associated with the use 
of pairwise likelihood might render these infeasible 
for data from thousands of sites. In such cases a ju- 
dicious sub-sampling of pairs seems necessary, but 
our expectation is that inference should be feasible 
in such settings. 

We apply our ideas to block maxima, essentially 
because this seems to be the only extremal setting 
for which spatial methods are currently available, 
but the extension to threshold modeling (Davison 
and Smith (1990); Coles and Tawn (1991)) would 
enable more flexible inference. Encouraging results 
for spatio-temporal modeling of rain data have been 
obtained in Huser and Davison (2012), and further 
exploration of related ideas, for example, due to 
Turkman, Turkman and Pereira (2010), seems emi- 
nently worthwhile. 

Throughout the discussion above we have sup- 
posed that the classical theory of extremes provides 
appropriate models for maxima, and, in particular, 
that the extremal dependence observed in the data 
can be extrapolated to higher levels for which ob- 
servations are unavailable. In practice, dependence 
is often seen to decrease for increasingly rare events, 
suggesting inadequacies in the classical formulation. 
The development of models for so-called near-inde- 
pendence (Ledford and Tawn, 1996, 1997; Heffernan 
and Tawn (2004); Ramos and Ledford (2009)) of 
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spatial extremal data would be very valuable. Wads- 
worth and Tawn (2012) tackle this important topic. 

APPENDIX: MCMC ALGORITHM FOR 
LATENT VARIABLE MODEL 

Inference for our latent variable model may be 
performed using a Gibbs sampler, whose steps we 
now describe. Given a current value of the Markov 
chain 

the next state ipt+i of the chain is obtained as fol- 
lows. 

Step 1: Updating the GEV parameters at each site. 
Each component of r\ t = {rj t (xi), . . . ,rj t (x£))} is up- 
dated singly according to the following scheme. Gen- 
erate a proposal i] p (x d ) from a symmetric random 
walk and compute the acceptance probability 

a{rit{x d ),r] p (x d )} 

= min{l,7r{y d | r] p (x d ), T t (x d ),£t{xd)} 

x ir(r) p | a v , Xr,, (3 V ) 

/{n{y d | m(x d ),T t (x d ),^ t (x d )} 

x ir(r) t | a v ,Xr,,(3 v ))}, 

that is, a ratio of GEV likelihoods times a ratio of 
multivariate Normal likelihoods. With probability 
ct{i] t (x d ),r] p (x d )}, the r](x d ) component of ij) t+1 is 
set to ij p (x d ); otherwise it remains at rjt(x d ). The 
scale and shape parameters are updated similarly. 

Step 2: Updating the regression parameters. Due 
to the use of conjugate priors, j3 v is drawn directly 
from a multivariate Normal distribution having co- 
variance matrix and mean vector 

{(£*) 1 + X^S^ 1 X r; } 1 , 

{(s;)- 1 + xjE-^-Hc^rv; + xts- 1 ^}, 

where /i* and S* are the mean vector and covari- 
ance matrix of the prior distribution for /3 and X^ 
is the design matrix related to the regression coeffi- 
cients (3^. Again the regression parameters for the 
GEV scale and shape parameters are updated simi- 
larly. 

Step 3: Updating the sill parameters of the covari- 
ance function. Due to the use of conjugate priors, a v 
is drawn directly from an inverse Gamma distribu- 



tion whose shape and rate parameters are 
\k + K a , 

e* av + la v>t (rj t - X v (3, ht ) T ^-}(rj t - X v f3 v , t ), 

where k* and #* are respectively the shape and 
scale parameters of the inverse Gamma prior distri- 
bution and X^ is the design matrix related to the 
regression coefficients (3^ . The sill parameters of the 
covariance function for the GEV scale and shape 
parameters are updated similarly. 

Step 4 : Updating the range parameters of the co- 
variance function. Generate a proposal X„ p ~ 
U(X v ,t — £\-,X v ,t + ca) and compute the acceptance 
probability 

&{X v ,t, X V:P ) 

= mm< 1, — : 

a ratio of multivariate Normal densities times the 
ratio of the prior densities and where kX and OX 
are respectively the shape and the scale parame- 
ters of the Gamma prior distribution. With prob- 
ability a(\ V: t, X v , p ), the component of i/'t+i is se t 
to A^. p ; otherwise it remains at A«t. The range pa- 
rameters related to the scale and shape GEV param- 
eters are updated similarly. If the covariance family 
has a shape parameter like the powered exponential 
or the Whittle-Matern covariance functions, this is 
updated in the same way. 
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